Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 264960 |
| Missing cells | 248517 |
| Missing cells (%) | 5.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 32.3 MiB |
| Average record size in memory | 128.0 B |
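The summary figures above can be cross-checked against one another. The following is a minimal sketch, assuming that "Missing cells (%)" is the share of all cells (observations × variables) and that every value occupies 8 bytes. The 8-byte figure is an assumption chosen because it is consistent with the 128.0 B record size; the report does not state it.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 264960
missing_cells = 248517

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 8 bytes per value (for example a float64 or an object pointer).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = n_observations * record_bytes / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B")
print(f"total size:    {total_mib:.1f} MiB")
```

Under these assumptions the computed values agree with the reported 5.9%, 128.0 B and 32.3 MiB.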
Variable types
| NUM | 11 |
|---|---|
| CAT | 5 |
Reproduction
| Analysis started | 2020-11-16 03:25:57.343899 |
|---|---|
| Analysis finished | 2020-11-16 03:27:02.412987 |
| Duration | 1 minute and 5.07 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Warning | Type |
|---|---|
| Zipcode has a high cardinality: 33120 distinct values | High cardinality |
| State has a high cardinality: 51 distinct values | High cardinality |
| City has a high cardinality: 14740 distinct values | High cardinality |
| Metro has a high cardinality: 861 distinct values | High cardinality |
| CountyName has a high cardinality: 1759 distinct values | High cardinality |
| med_hIncome is highly correlated with Year and 3 other fields | High correlation |
| Year is highly correlated with med_hIncome and 3 other fields | High correlation |
| uspop_growth is highly correlated with int_rate | High correlation |
| int_rate is highly correlated with uspop_growth | High correlation |
| unemplt_rate is highly correlated with Year and 3 other fields | High correlation |
| newHouse_starts is highly correlated with Year and 3 other fields | High correlation |
| resConstruct_spending is highly correlated with Year and 3 other fields | High correlation |
| RentPrice has 19907 (7.5%) missing values | Missing |
| SizeRank has 27056 (10.2%) missing values | Missing |
| State has 27056 (10.2%) missing values | Missing |
| City has 27056 (10.2%) missing values | Missing |
| Metro has 83008 (31.3%) missing values | Missing |
| CountyName has 27056 (10.2%) missing values | Missing |
| HomePrice has 37378 (14.1%) missing values | Missing |
| Zipcode is uniformly distributed | Uniform |
| Vacancy_Rate% has 14908 (5.6%) zeros | Zeros |
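The per-variable missing counts in the warnings can be reconciled with the dataset-level "Missing cells" total of 248517. This is a small sketch, assuming the warnings list every variable that has missing values; the percentages are taken relative to the 264960 observations.

```python
# Per-variable missing counts copied from the "Missing" warnings.
missing_by_variable = {
    "RentPrice": 19907,
    "SizeRank": 27056,
    "State": 27056,
    "City": 27056,
    "Metro": 83008,
    "CountyName": 27056,
    "HomePrice": 37378,
}
n_observations = 264960

total_missing = sum(missing_by_variable.values())
for name, count in missing_by_variable.items():
    print(f"{name:<12}{count:>7}  {100 * count / n_observations:.1f}%")
print(f"{'total':<12}{total_missing:>7}")
```

The total printed on the last line equals the "Missing cells" figure, so the seven warnings account for every missing cell in the dataset.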
Zipcode
| Distinct count | 33120 |
|---|---|
| Unique (%) | 12.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.0 MiB |
| 17021 | 8 |
|---|---|
| 73662 | 8 |
| 95616 | 8 |
| 02646 | 8 |
| 00983 | 8 |
| Other values (33115) |
| Value | Count | Frequency (%) | |
| 17021 | 8 | < 0.1% | |
| 73662 | 8 | < 0.1% | |
| 95616 | 8 | < 0.1% | |
| 02646 | 8 | < 0.1% | |
| 00983 | 8 | < 0.1% | |
| 95060 | 8 | < 0.1% | |
| 79088 | 8 | < 0.1% | |
| 65715 | 8 | < 0.1% | |
| 68928 | 8 | < 0.1% | |
| 30464 | 8 | < 0.1% | |
| Other values (33110) | 264880 | > 99.9% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
RentPrice
| Distinct count | 153859 |
|---|---|
| Unique (%) | 62.8% |
| Missing | 19907 |
| Missing (%) | 7.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1067.691442092119 |
|---|---|
| Minimum | 19.960000000000036 |
| Maximum | 5620.320000000002 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 19.96 |
|---|---|
| 5-th percentile | 581.736 |
| Q1 | 781.625 |
| median | 942.446 |
| Q3 | 1204.816 |
| 95-th percentile | 1960.804 |
| Maximum | 5620.32 |
| Range | 5600.36 |
| Interquartile range (IQR) | 423.191 |
Descriptive statistics
| Standard deviation | 491.6269767 |
|---|---|
| Coefficient of variation (CV) | 0.4604579163 |
| Kurtosis | 13.77100961 |
| Mean | 1067.691442 |
| Median Absolute Deviation (MAD) | 195.09 |
| Skewness | 2.826552991 |
| Sum | 261640991 |
| Variance | 241697.0842 |
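Several of the RentPrice figures are derived from others in the same tables. The sketch below recomputes the range, the interquartile range, the coefficient of variation and the variance from the reported minimum, maximum, quartiles, mean and standard deviation. The inputs are the rounded values as displayed, so agreement holds only up to rounding.

```python
# RentPrice figures copied from the tables above (rounded as displayed).
minimum, maximum = 19.96, 5620.32
q1, q3 = 781.625, 1204.816
mean, std = 1067.691442, 491.6269767

value_range = maximum - minimum   # reported: 5600.36
iqr = q3 - q1                     # reported: 423.191
cv = std / mean                   # reported: 0.4604579163
variance = std ** 2               # reported: 241697.0842

print(f"range    = {value_range:.2f}")
print(f"IQR      = {iqr:.3f}")
print(f"CV       = {cv:.6f}")
print(f"variance = {variance:.2f}")
```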
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 681.736 | 600 | 0.2% | |
| 731.736 | 552 | 0.2% | |
| 631.736 | 536 | 0.2% | |
| 1281.736 | 528 | 0.2% | |
| 831.736 | 503 | 0.2% | |
| 581.736 | 491 | 0.2% | |
| 781.736 | 490 | 0.2% | |
| 1006.736 | 487 | 0.2% | |
| 881.736 | 366 | 0.1% | |
| 1106.736 | 340 | 0.1% | |
| Other values (153849) | 240160 | 90.6% | |
| (Missing) | 19907 | 7.5% |
| Value | Count | Frequency (%) | |
| 19.96 | 7 | < 0.1% | |
| 94.96 | 8 | < 0.1% | |
| 103.29 | 1 | < 0.1% | |
| 133.95 | 1 | < 0.1% | |
| 139.4 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 5620.32 | 5 | < 0.1% | |
| 5619.795 | 2 | < 0.1% | |
| 5616.46 | 3 | < 0.1% | |
| 5563.03 | 2 | < 0.1% | |
| 5558.206 | 125 | < 0.1% |
Year
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2014.5 |
|---|---|
| Minimum | 2011 |
| Maximum | 2018 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 2011 |
|---|---|
| 5-th percentile | 2011 |
| Q1 | 2012.75 |
| median | 2014.5 |
| Q3 | 2016.25 |
| 95-th percentile | 2018 |
| Maximum | 2018 |
| Range | 7 |
| Interquartile range (IQR) | 3.5 |
Descriptive statistics
| Standard deviation | 2.291292171 |
|---|---|
| Coefficient of variation (CV) | 0.001137399936 |
| Kurtosis | -1.238095957 |
| Mean | 2014.5 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0 |
| Sum | 533761920 |
| Variance | 5.250019814 |
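The Year statistics follow directly from the frequency table, in which each year from 2011 to 2018 occurs 33120 times. The sketch below recomputes the mean and variance from those counts. It assumes the reported variance uses the sample divisor n - 1; that assumption is not stated in the report, but it reproduces the reported 5.250019814, whereas the population divisor n gives exactly 5.25.

```python
import math

# Year frequency table: each year 2011..2018 occurs 33120 times.
counts = {year: 33120 for year in range(2011, 2019)}

n = sum(counts.values())
mean = sum(year * c for year, c in counts.items()) / n
squared_dev = sum(c * (year - mean) ** 2 for year, c in counts.items())

sample_variance = squared_dev / (n - 1)   # divisor n - 1 (assumed)
population_variance = squared_dev / n     # divisor n, for comparison
sample_std = math.sqrt(sample_variance)

print(f"n = {n}, mean = {mean}")
print(f"sample variance     = {sample_variance:.9f}")
print(f"population variance = {population_variance}")
print(f"sample std          = {sample_std:.9f}")
```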
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2018 | 33120 | 12.5% | |
| 2017 | 33120 | 12.5% | |
| 2016 | 33120 | 12.5% | |
| 2015 | 33120 | 12.5% | |
| 2014 | 33120 | 12.5% | |
| 2013 | 33120 | 12.5% | |
| 2012 | 33120 | 12.5% | |
| 2011 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 2011 | 33120 | 12.5% | |
| 2012 | 33120 | 12.5% | |
| 2013 | 33120 | 12.5% | |
| 2014 | 33120 | 12.5% | |
| 2015 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 2018 | 33120 | 12.5% | |
| 2017 | 33120 | 12.5% | |
| 2016 | 33120 | 12.5% | |
| 2015 | 33120 | 12.5% | |
| 2014 | 33120 | 12.5% |
SizeRank
| Distinct count | 11054 |
|---|---|
| Unique (%) | 4.6% |
| Missing | 27056 |
| Missing (%) | 10.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15646.706806106666 |
|---|---|
| Minimum | 0.0 |
| Maximum | 34430.0 |
| Zeros | 8 |
| Zeros (%) | < 0.1% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1503 |
| Q1 | 7531 |
| median | 15164 |
| Q3 | 23514 |
| 95-th percentile | 31180 |
| Maximum | 34430 |
| Range | 34430 |
| Interquartile range (IQR) | 15983 |
Descriptive statistics
| Standard deviation | 9424.124602 |
|---|---|
| Coefficient of variation (CV) | 0.6023072279 |
| Kurtosis | -1.122786021 |
| Mean | 15646.70681 |
| Median Absolute Deviation (MAD) | 7971.5 |
| Skewness | 0.1321156224 |
| Sum | 3722414136 |
| Variance | 88814124.52 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 29964 | 288 | 0.1% | |
| 30545 | 280 | 0.1% | |
| 32062 | 280 | 0.1% | |
| 28097 | 248 | 0.1% | |
| 29685 | 248 | 0.1% | |
| 30892 | 248 | 0.1% | |
| 28904 | 248 | 0.1% | |
| 29439 | 240 | 0.1% | |
| 30504 | 232 | 0.1% | |
| 28784 | 232 | 0.1% | |
| Other values (11044) | 235360 | 88.8% | |
| (Missing) | 27056 | 10.2% |
| Value | Count | Frequency (%) | |
| 0 | 8 | < 0.1% | |
| 1 | 8 | < 0.1% | |
| 2 | 8 | < 0.1% | |
| 3 | 8 | < 0.1% | |
| 4 | 8 | < 0.1% |
| Value | Count | Frequency (%) | |
| 34430 | 184 | 0.1% | |
| 34322 | 128 | < 0.1% | |
| 34302 | 24 | < 0.1% | |
| 34272 | 8 | < 0.1% | |
| 34258 | 8 | < 0.1% |
State
| Distinct count | 51 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 27056 |
| Missing (%) | 10.2% |
| Memory size | 2.0 MiB |
| TX | 14080 |
|---|---|
| NY | 13584 |
| CA | 13304 |
| PA | 13048 |
| IL | 10152 |
| Other values (46) |
| Value | Count | Frequency (%) | |
| TX | 14080 | 5.3% | |
| NY | 13584 | 5.1% | |
| CA | 13304 | 5.0% | |
| PA | 13048 | 4.9% | |
| IL | 10152 | 3.8% | |
| OH | 9280 | 3.5% | |
| FL | 7544 | 2.8% | |
| MI | 7504 | 2.8% | |
| MO | 7432 | 2.8% | |
| IA | 7384 | 2.8% | |
| Other values (41) | 134592 | 50.8% | |
| (Missing) | 27056 | 10.2% |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 2.102113527 |
| Min length | 2 |
City
| Distinct count | 14740 |
|---|---|
| Unique (%) | 6.2% |
| Missing | 27056 |
| Missing (%) | 10.2% |
| Memory size | 2.0 MiB |
| New York | 1368 |
|---|---|
| Houston | 856 |
| Los Angeles | 800 |
| San Antonio | 448 |
| Chicago | 440 |
| Other values (14735) |
| Value | Count | Frequency (%) | |
| New York | 1368 | 0.5% | |
| Houston | 856 | 0.3% | |
| Los Angeles | 800 | 0.3% | |
| San Antonio | 448 | 0.2% | |
| Chicago | 440 | 0.2% | |
| Springfield | 432 | 0.2% | |
| Dallas | 416 | 0.2% | |
| Columbus | 408 | 0.2% | |
| Kansas City | 400 | 0.2% | |
| Philadelphia | 392 | 0.1% | |
| Other values (14730) | 231944 | 87.5% | |
| (Missing) | 27056 | 10.2% |
Length
| Max length | 30 |
|---|---|
| Median length | 8 |
| Mean length | 8.448641304 |
| Min length | 3 |
Metro
| Distinct count | 861 |
|---|---|
| Unique (%) | 0.5% |
| Missing | 83008 |
| Missing (%) | 31.3% |
| Memory size | 2.0 MiB |
| New York-Newark-Jersey City | 7416 |
|---|---|
| Chicago-Naperville-Elgin | 3056 |
| Los Angeles-Long Beach-Anaheim | 2896 |
| Philadelphia-Camden-Wilmington | 2832 |
| Washington-Arlington-Alexandria | 2552 |
| Other values (856) |
| Value | Count | Frequency (%) | |
| New York-Newark-Jersey City | 7416 | 2.8% | |
| Chicago-Naperville-Elgin | 3056 | 1.2% | |
| Los Angeles-Long Beach-Anaheim | 2896 | 1.1% | |
| Philadelphia-Camden-Wilmington | 2832 | 1.1% | |
| Washington-Arlington-Alexandria | 2552 | 1.0% | |
| Pittsburgh | 2544 | 1.0% | |
| Boston-Cambridge-Newton | 2208 | 0.8% | |
| Dallas-Fort Worth-Arlington | 2112 | 0.8% | |
| Houston-The Woodlands-Sugar Land | 1888 | 0.7% | |
| Minneapolis-St. Paul-Bloomington | 1840 | 0.7% | |
| Other values (851) | 152608 | 57.6% | |
| (Missing) | 83008 | 31.3% |
Length
| Max length | 42 |
|---|---|
| Median length | 9 |
| Mean length | 12.30697464 |
| Min length | 3 |
CountyName
| Distinct count | 1759 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 27056 |
| Missing (%) | 10.2% |
| Memory size | 2.0 MiB |
| Washington County | 2848 |
|---|---|
| Jefferson County | 2600 |
| Los Angeles County | 2200 |
| Franklin County | 2120 |
| Montgomery County | 2120 |
| Other values (1754) |
| Value | Count | Frequency (%) | |
| Washington County | 2848 | 1.1% | |
| Jefferson County | 2600 | 1.0% | |
| Los Angeles County | 2200 | 0.8% | |
| Franklin County | 2120 | 0.8% | |
| Montgomery County | 2120 | 0.8% | |
| Jackson County | 1864 | 0.7% | |
| Orange County | 1760 | 0.7% | |
| Marion County | 1456 | 0.5% | |
| Wayne County | 1408 | 0.5% | |
| Monroe County | 1408 | 0.5% | |
| Other values (1749) | 218120 | 82.3% | |
| (Missing) | 27056 | 10.2% |
Length
| Max length | 29 |
|---|---|
| Median length | 14 |
| Mean length | 13.07388285 |
| Min length | 3 |
HomePrice
| Distinct count | 220057 |
|---|---|
| Unique (%) | 96.7% |
| Missing | 37378 |
| Missing (%) | 14.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 184668.82838348375 |
|---|---|
| Minimum | 10421.83 |
| Maximum | 6141945.92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 10421.83 |
|---|---|
| 5-th percentile | 50137.775 |
| Q1 | 87356.0825 |
| median | 134018.5 |
| Q3 | 214174.6025 |
| 95-th percentile | 481902.003 |
| Maximum | 6141945.92 |
| Range | 6131524.09 |
| Interquartile range (IQR) | 126818.52 |
Descriptive statistics
| Standard deviation | 185822.1927 |
|---|---|
| Coefficient of variation (CV) | 1.006245582 |
| Kurtosis | 65.51499059 |
| Mean | 184668.8284 |
| Median Absolute Deviation (MAD) | 55750.125 |
| Skewness | 5.668235713 |
| Sum | 4.20273013e+10 |
| Variance | 3.452988731e+10 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 75169.83 | 4 | < 0.1% | |
| 54673.67 | 4 | < 0.1% | |
| 110771.67 | 4 | < 0.1% | |
| 88629 | 4 | < 0.1% | |
| 81272 | 4 | < 0.1% | |
| 57537.17 | 4 | < 0.1% | |
| 140845.33 | 3 | < 0.1% | |
| 112880.33 | 3 | < 0.1% | |
| 64628.33 | 3 | < 0.1% | |
| 73042.33 | 3 | < 0.1% | |
| Other values (220047) | 227546 | 85.9% | |
| (Missing) | 37378 | 14.1% |
| Value | Count | Frequency (%) | |
| 10421.83 | 1 | < 0.1% | |
| 10956.33 | 1 | < 0.1% | |
| 11688 | 1 | < 0.1% | |
| 11860.83 | 1 | < 0.1% | |
| 12041.42 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 6141945.92 | 1 | < 0.1% | |
| 5373670.92 | 1 | < 0.1% | |
| 5197037.17 | 1 | < 0.1% | |
| 4928414.67 | 1 | < 0.1% | |
| 4771183.92 | 1 | < 0.1% |
Vacancy_Rate%
| Distinct count | 175367 |
|---|---|
| Unique (%) | 66.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.675088463689917 |
|---|---|
| Minimum | 0.0 |
| Maximum | 100.0 |
| Zeros | 14908 |
| Zeros (%) | 5.6% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6.977963141 |
| median | 12.79997659 |
| Q3 | 22.58064516 |
| 95-th percentile | 52.63157895 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 15.60268202 |
Descriptive statistics
| Standard deviation | 16.4379872 |
|---|---|
| Coefficient of variation (CV) | 0.9300087651 |
| Kurtosis | 4.626284265 |
| Mean | 17.67508846 |
| Median Absolute Deviation (MAD) | 6.940601589 |
| Skewness | 1.963426228 |
| Sum | 4683191.439 |
| Variance | 270.207423 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 14908 | 5.6% | |
| 100 | 712 | 0.3% | |
| 20 | 313 | 0.1% | |
| 25 | 286 | 0.1% | |
| 33.33333333 | 266 | 0.1% | |
| 16.66666667 | 263 | 0.1% | |
| 14.28571429 | 218 | 0.1% | |
| 50 | 198 | 0.1% | |
| 12.5 | 190 | 0.1% | |
| 11.11111111 | 176 | 0.1% | |
| Other values (175357) | 247430 | 93.4% |
| Value | Count | Frequency (%) | |
| 0 | 14908 | 5.6% | |
| 0.02272727273 | 1 | < 0.1% | |
| 0.1114827202 | 1 | < 0.1% | |
| 0.1248439451 | 1 | < 0.1% | |
| 0.1402524544 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 100 | 712 | 0.3% | |
| 99.83974359 | 1 | < 0.1% | |
| 99.71791255 | 1 | < 0.1% | |
| 99.65337955 | 1 | < 0.1% | |
| 99.57386364 | 1 | < 0.1% |
int_rate
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.109375 |
|---|---|
| Minimum | 0.75 |
| Maximum | 2.458333333333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 0.75 |
|---|---|
| 5-th percentile | 0.75 |
| Q1 | 0.75 |
| median | 0.7604166667 |
| Q3 | 1.171875 |
| 95-th percentile | 2.458333333 |
| Maximum | 2.458333333 |
| Range | 1.708333333 |
| Interquartile range (IQR) | 0.421875 |
Descriptive statistics
| Standard deviation | 0.5835901449 |
|---|---|
| Coefficient of variation (CV) | 0.5260530884 |
| Kurtosis | 0.7307521433 |
| Mean | 1.109375 |
| Median Absolute Deviation (MAD) | 0.01041666667 |
| Skewness | 1.488402739 |
| Sum | 293940 |
| Variance | 0.3405774573 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0.75 | 132480 | 50.0% | |
| 1.020833333 | 33120 | 12.5% | |
| 2.458333333 | 33120 | 12.5% | |
| 1.625 | 33120 | 12.5% | |
| 0.7708333333 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 0.75 | 132480 | 50.0% | |
| 0.7708333333 | 33120 | 12.5% | |
| 1.020833333 | 33120 | 12.5% | |
| 1.625 | 33120 | 12.5% | |
| 2.458333333 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 2.458333333 | 33120 | 12.5% | |
| 1.625 | 33120 | 12.5% | |
| 1.020833333 | 33120 | 12.5% | |
| 0.7708333333 | 33120 | 12.5% | |
| 0.75 | 132480 | 50.0% |
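Because int_rate takes only five distinct values, its mean and median can be recovered from the frequency table alone. The sketch below expands the table into a sorted sample and recomputes both. The values are the rounded ones displayed in the table, so the results match the reported 1.109375 and 0.7604166667 only up to rounding.

```python
import statistics

# int_rate frequency table (values as displayed, i.e. rounded).
frequencies = {
    0.75: 132480,
    0.7708333333: 33120,
    1.020833333: 33120,
    1.625: 33120,
    2.458333333: 33120,
}

# Expand into a sorted sample; about 265k floats, which is small.
sample = sorted(v for v, c in frequencies.items() for _ in range(c))

mean = sum(sample) / len(sample)
median = statistics.median(sample)

print(f"n      = {len(sample)}")
print(f"mean   = {mean:.6f}")
print(f"median = {median:.6f}")
```

The median falls between 0.75 and 0.7708333333 because exactly half of the observations equal 0.75, so the two middle elements of the sorted sample straddle that boundary.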
med_hIncome
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60351.0 |
|---|---|
| Minimum | 56912.0 |
| Maximum | 64324.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 56912 |
|---|---|
| 5-th percentile | 56912 |
| Q1 | 57756 |
| median | 59945.5 |
| Q3 | 63113.75 |
| 95-th percentile | 64324 |
| Maximum | 64324 |
| Range | 7412 |
| Interquartile range (IQR) | 5357.75 |
Descriptive statistics
| Standard deviation | 2846.855913 |
|---|---|
| Coefficient of variation (CV) | 0.04717164443 |
| Kurtosis | -1.621558691 |
| Mean | 60351 |
| Median Absolute Deviation (MAD) | 2938.5 |
| Skewness | 0.1383636684 |
| Sum | 1.599060096e+10 |
| Variance | 8104588.588 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 64324 | 33120 | 12.5% | |
| 63761 | 33120 | 12.5% | |
| 62898 | 33120 | 12.5% | |
| 60987 | 33120 | 12.5% | |
| 58904 | 33120 | 12.5% | |
| 58001 | 33120 | 12.5% | |
| 57021 | 33120 | 12.5% | |
| 56912 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 56912 | 33120 | 12.5% | |
| 57021 | 33120 | 12.5% | |
| 58001 | 33120 | 12.5% | |
| 58904 | 33120 | 12.5% | |
| 60987 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 64324 | 33120 | 12.5% | |
| 63761 | 33120 | 12.5% | |
| 62898 | 33120 | 12.5% | |
| 60987 | 33120 | 12.5% | |
| 58904 | 33120 | 12.5% |
uspop_growth
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6827791724977739 |
|---|---|
| Minimum | 0.5223373578996761 |
| Maximum | 0.730641178178307 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 0.5223373579 |
|---|---|
| 5-th percentile | 0.5223373579 |
| Q1 | 0.67283184 |
| median | 0.718343551 |
| Q3 | 0.7273311718 |
| 95-th percentile | 0.7306411782 |
| Maximum | 0.7306411782 |
| Range | 0.2083038203 |
| Interquartile range (IQR) | 0.05449933187 |
Descriptive statistics
| Standard deviation | 0.06823199467 |
|---|---|
| Coefficient of variation (CV) | 0.09993274168 |
| Kurtosis | 0.9576098655 |
| Mean | 0.6827791725 |
| Median Absolute Deviation (MAD) | 0.01073588595 |
| Skewness | -1.531094737 |
| Sum | 180909.1695 |
| Variance | 0.004655605096 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0.7306411782 | 33120 | 12.5% | |
| 0.7200176887 | 33120 | 12.5% | |
| 0.7272689972 | 33120 | 12.5% | |
| 0.6310078932 | 33120 | 12.5% | |
| 0.7166694134 | 33120 | 12.5% | |
| 0.5223373579 | 33120 | 12.5% | |
| 0.7275176958 | 33120 | 12.5% | |
| 0.6867731556 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 0.5223373579 | 33120 | 12.5% | |
| 0.6310078932 | 33120 | 12.5% | |
| 0.6867731556 | 33120 | 12.5% | |
| 0.7166694134 | 33120 | 12.5% | |
| 0.7200176887 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 0.7306411782 | 33120 | 12.5% | |
| 0.7275176958 | 33120 | 12.5% | |
| 0.7272689972 | 33120 | 12.5% | |
| 0.7200176887 | 33120 | 12.5% | |
| 0.7166694134 | 33120 | 12.5% |
unemplt_rate
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.113541666666668 |
|---|---|
| Minimum | 3.8916666666666666 |
| Maximum | 8.933333333333334 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 3.891666667 |
|---|---|
| 5-th percentile | 3.891666667 |
| Q1 | 4.741666667 |
| median | 5.716666667 |
| Q3 | 7.5375 |
| 95-th percentile | 8.933333333 |
| Maximum | 8.933333333 |
| Range | 5.041666667 |
| Interquartile range (IQR) | 2.795833333 |
Descriptive statistics
| Standard deviation | 1.719867468 |
|---|---|
| Coefficient of variation (CV) | 0.2813209693 |
| Kurtosis | -1.321304391 |
| Mean | 6.113541667 |
| Median Absolute Deviation (MAD) | 1.508333333 |
| Skewness | 0.3163534165 |
| Sum | 1619844 |
| Variance | 2.957944106 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 5.275 | 33120 | 12.5% | |
| 6.158333333 | 33120 | 12.5% | |
| 7.358333333 | 33120 | 12.5% | |
| 8.075 | 33120 | 12.5% | |
| 3.891666667 | 33120 | 12.5% | |
| 8.933333333 | 33120 | 12.5% | |
| 4.341666667 | 33120 | 12.5% | |
| 4.875 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 3.891666667 | 33120 | 12.5% | |
| 4.341666667 | 33120 | 12.5% | |
| 4.875 | 33120 | 12.5% | |
| 5.275 | 33120 | 12.5% | |
| 6.158333333 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 8.933333333 | 33120 | 12.5% | |
| 8.075 | 33120 | 12.5% | |
| 7.358333333 | 33120 | 12.5% | |
| 6.158333333 | 33120 | 12.5% | |
| 5.275 | 33120 | 12.5% |
newHouse_starts
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1007.8854166666666 |
|---|---|
| Minimum | 611.9166666666666 |
| Maximum | 1248.25 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 611.9166667 |
|---|---|
| 5-th percentile | 611.9166667 |
| Q1 | 892.0625 |
| median | 1053.5 |
| Q3 | 1184.291667 |
| 95-th percentile | 1248.25 |
| Maximum | 1248.25 |
| Range | 636.3333333 |
| Interquartile range (IQR) | 292.2291667 |
Descriptive statistics
| Standard deviation | 208.9448726 |
|---|---|
| Coefficient of variation (CV) | 0.2073101458 |
| Kurtosis | -0.8373471732 |
| Mean | 1007.885417 |
| Median Absolute Deviation (MAD) | 139.625 |
| Skewness | -0.6338115257 |
| Sum | 267049320 |
| Variance | 43657.9598 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1000.25 | 33120 | 12.5% | |
| 1176.583333 | 33120 | 12.5% | |
| 1207.416667 | 33120 | 12.5% | |
| 611.9166667 | 33120 | 12.5% | |
| 783.75 | 33120 | 12.5% | |
| 1248.25 | 33120 | 12.5% | |
| 928.1666667 | 33120 | 12.5% | |
| 1106.75 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 611.9166667 | 33120 | 12.5% | |
| 783.75 | 33120 | 12.5% | |
| 928.1666667 | 33120 | 12.5% | |
| 1000.25 | 33120 | 12.5% | |
| 1106.75 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 1248.25 | 33120 | 12.5% | |
| 1207.416667 | 33120 | 12.5% | |
| 1176.583333 | 33120 | 12.5% | |
| 1106.75 | 33120 | 12.5% | |
| 1000.25 | 33120 | 12.5% |
resConstruct_spending
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 410836.1979166665 |
|---|---|
| Minimum | 255208.58333333328 |
| Maximum | 564448.75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.0 MiB |
Quantile statistics
| Minimum | 255208.5833 |
|---|---|
| 5-th percentile | 255208.5833 |
| Q1 | 321154.3958 |
| median | 410493.3333 |
| Q3 | 500871.9167 |
| 95-th percentile | 564448.75 |
| Maximum | 564448.75 |
| Range | 309240.1667 |
| Interquartile range (IQR) | 179717.5208 |
Descriptive statistics
| Standard deviation | 109740.0191 |
|---|---|
| Coefficient of variation (CV) | 0.2671138026 |
| Kurtosis | -1.409802147 |
| Mean | 410836.1979 |
| Median Absolute Deviation (MAD) | 103413.4583 |
| Skewness | 0.002059248536 |
| Sum | 1.08855159e+11 |
| Variance | 1.204287178e+10 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 546020.1667 | 33120 | 12.5% | |
| 278995.5833 | 33120 | 12.5% | |
| 485822.5 | 33120 | 12.5% | |
| 335207.3333 | 33120 | 12.5% | |
| 382868.3333 | 33120 | 12.5% | |
| 438118.3333 | 33120 | 12.5% | |
| 564448.75 | 33120 | 12.5% | |
| 255208.5833 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 255208.5833 | 33120 | 12.5% | |
| 278995.5833 | 33120 | 12.5% | |
| 335207.3333 | 33120 | 12.5% | |
| 382868.3333 | 33120 | 12.5% | |
| 438118.3333 | 33120 | 12.5% |
| Value | Count | Frequency (%) | |
| 564448.75 | 33120 | 12.5% | |
| 546020.1667 | 33120 | 12.5% | |
| 485822.5 | 33120 | 12.5% | |
| 438118.3333 | 33120 | 12.5% | |
| 382868.3333 | 33120 | 12.5% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
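The calculation described above can be sketched in plain Python. The data below are hypothetical and chosen only for illustration; `pearson_r` is a helper written for this example, not a function from the profiling library.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) factors cancel between numerator and denominator.
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(xs, ys)
# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_transformed = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])

print(r, r_transformed)
```

The second call illustrates the invariance under changes in location and scale. Note that rescaling one variable by a negative factor flips the sign of r.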
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
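A minimal sketch of this procedure replaces each value by its rank and then applies the Pearson formula to the ranks. The helpers `average_ranks`, `pearson_r` and `spearman_rho` are written for this example, and assigning tied values the mean of their positions is an assumption about tie handling that the text does not specify. The data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: monotonic but strongly nonlinear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(x) for x in xs]

rho = spearman_rho(xs, ys)   # the ranks agree exactly
r = pearson_r(xs, ys)        # reduced by the curvature

print(rho, r)
```

For this exponential relationship ρ equals 1 while r is noticeably smaller, which is the sense in which ρ captures nonlinear monotonic association better than r.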
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
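The pair-counting definition above can be sketched directly. The function below implements that definition as stated (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations may report a tie-corrected variant instead. `kendall_tau_a` and the data are hypothetical and for illustration only.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Hypothetical sample data: 10 pairs, of which 8 are concordant and 2 discordant.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

tau = kendall_tau_a(xs, ys)
print(tau)
```

This loop is quadratic in the number of observations, which is fine for a small illustration but would be slow on a dataset the size of the one profiled here.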
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
First rows
| | Zipcode | RentPrice | Year | SizeRank | State | City | Metro | CountyName | HomePrice | Vacancy_Rate% | int_rate | med_hIncome | uspop_growth | unemplt_rate | newHouse_starts | resConstruct_spending |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02333 | 1368.536 | 2011 | 8782.0 | MA | East Bridgewater | Boston-Cambridge-Newton | Plymouth County | NaN | 3.024027 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 1 | 02338 | 1311.076 | 2011 | 11179.0 | MA | Halifax | Boston-Cambridge-Newton | Plymouth County | 274920.17 | 3.116343 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 2 | 02339 | 1484.626 | 2011 | 8621.0 | MA | Hanover | Boston-Cambridge-Newton | Plymouth County | 415097.50 | 4.464646 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 3 | 02341 | 1266.816 | 2011 | 10079.0 | MA | Hanson | Boston-Cambridge-Newton | Plymouth County | NaN | 3.586322 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 4 | 02343 | 1524.006 | 2011 | 9640.0 | MA | Holbrook | Boston-Cambridge-Newton | Norfolk County | 247510.42 | 3.732901 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 5 | 02346 | 1310.016 | 2011 | 5289.0 | MA | Middleborough | Boston-Cambridge-Newton | Plymouth County | 264492.50 | 7.960256 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 6 | 02347 | 1307.736 | 2011 | 9579.0 | MA | Lakeville | Boston-Cambridge-Newton | Plymouth County | 309743.67 | 11.565968 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 7 | 02351 | 1399.926 | 2011 | 7293.0 | MA | Abington | Boston-Cambridge-Newton | Plymouth County | 279614.92 | 5.455122 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 8 | 02356 | 1753.956 | 2011 | 9084.0 | MA | Easton | Providence-Warwick | Bristol County | 371979.42 | 2.849920 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
| 9 | 02357 | 581.736 | 2011 | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 0.75 | 57021.0 | 0.720018 | 8.933333 | 611.916667 | 255208.583333 |
Last rows
| | Zipcode | RentPrice | Year | SizeRank | State | City | Metro | CountyName | HomePrice | Vacancy_Rate% | int_rate | med_hIncome | uspop_growth | unemplt_rate | newHouse_starts | resConstruct_spending |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 264950 | 98134 | 1909.58 | 2018 | 28159.0 | WA | Seattle | Seattle-Tacoma-Bellevue | King County | 438970.00 | 13.580247 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264951 | 98174 | NaN | 2018 | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264952 | 98222 | NaN | 2018 | 30981.0 | WA | Olga | NaN | San Juan County | 580646.58 | 83.471074 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264953 | 98233 | 1413.89 | 2018 | 7640.0 | WA | Burlington | Mount Vernon-Anacortes | Skagit County | 317426.75 | 4.853765 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264954 | 98243 | 1302.94 | 2018 | NaN | NaN | NaN | NaN | NaN | NaN | 57.293233 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264955 | 98279 | 1059.87 | 2018 | 23400.0 | WA | Olga | NaN | San Juan County | 552805.42 | 51.219512 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264956 | 98280 | 993.85 | 2018 | 25265.0 | WA | Eastsound | NaN | San Juan County | 678499.00 | 51.329243 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264957 | 98311 | 1533.50 | 2018 | 4981.0 | WA | Bremerton | Bremerton-Silverdale | Kitsap County | 314320.83 | 6.540162 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264958 | 98326 | 778.99 | 2018 | 26185.0 | WA | Clallam Bay | Port Angeles | Clallam County | 150193.17 | 28.537736 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 264959 | 98332 | 1840.86 | 2018 | 6759.0 | WA | Gig Harbor | Seattle-Tacoma-Bellevue | Pierce County | 535136.75 | 7.340077 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |